
[CHORE] Make foundation api /init use correct schema, index, dim #7127

Merged: HammadB merged 4 commits into main from hammad/foundation_api on May 26, 2026


@HammadB HammadB commented May 25, 2026

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • init now uses correct schema, index and dim based on foundation settings.
  • New functionality
    • /

Test plan

How are these changes tested?
Added tests for the new basic schema, collection-creation, and planning logic. The hardcoded 1024 dimension is untested.

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

None needed

Observability plan

No additional observability needed

Documentation Changes

None

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?


@HammadB HammadB requested a review from davedash May 25, 2026 19:17
@HammadB HammadB merged commit 8079330 into main May 26, 2026
117 of 121 checks passed
LLay added a commit that referenced this pull request May 26, 2026
Mirrors the CLI POC (chroma-core/foundation #97): after ensuring each
source collection, /init attaches the server-side function via
SysDb::create_attached_function, with the wiki collection as output.

- Attachment name `{source}_to_wiki`; operator `http_generate`
  (configurable); params carry the modal `endpoint_url`,
  `source_collection`, and `source_kind`.
- New FoundationConfig fields: function_name, function_endpoint_url,
  min_records_for_invocation (defaults mirror the POC + the chroma
  frontend's 100-record default). Output dimension is already hardcoded
  to 1024 in /init (chroma #7127), so no seed_output_collection step is
  needed.
- Idempotent: AlreadyExists / CollectionAlreadyHasFunction are treated
  as success so /init stays safe to call repeatedly.
LLay added a commit that referenced this pull request May 27, 2026
Mirrors the CLI POC (chroma-core/foundation #97): after ensuring each
source collection, /init attaches the server-side function via
SysDb::create_attached_function, with the wiki collection as output.

- Attachment name `{source}_to_wiki`; operator `http_generate`
  (configurable); params carry the modal `endpoint_url`,
  `source_collection`, and `source_kind`.
- New FoundationConfig fields: function_name, function_endpoint_url,
  min_records_for_invocation (defaults mirror the POC + the chroma
  frontend's 100-record default). Output dimension is already hardcoded
  to 1024 in /init (chroma #7127), so no seed_output_collection step is
  needed.
- Idempotent: AlreadyExists / CollectionAlreadyHasFunction are treated
  as success so /init stays safe to call repeatedly.
LLay added a commit that referenced this pull request May 28, 2026
…7134)

> **Draft.** Stacked on #7133 (the worker change that *reads* the
chunk-sibling flag). Merge #7133 first, then rebase this onto main.

## Summary

Extends the foundation `/init` endpoint to mirror the CLI POC
(chroma-core/foundation #97), so `/init` is the single bootstrap for a
team's foundation workspace. On top of the existing wiki +
wiki_revisions creation, `/init` now:

1. **Ensures the source collections** (slack, notion; configurable via
`CHROMA_FOUNDATION__SOURCE_COLLECTIONS`).
2. **Sets `chroma:group_chunk_siblings = true`** on each source
collection so the worker's `PartitionOperator` (#7133) keeps a job's
chunk records in one partition — the ordering the end-of-job marker
relies on (ADR 0001 §6).
3. **Attaches the foundation function** to each source collection via
`SysDb::create_attached_function`, with the wiki collection as output —
the server-side equivalent of the POC's HTTP attach.

## Function attach (mirrors POC #97)

- Attachment name `{source}_to_wiki`; operator `http_generate`
(configurable).
- `params`: `{ endpoint_url, source_collection, source_kind }` —
`endpoint_url` defaults to the modal URL from the POC;
`source_collection`/`source_kind` are the source name.
- `min_records_for_invocation` defaults to 100 (matches the chroma
frontend default).
- **No `seed_output_collection` step** — per @HammadB, the output
dimension is already hardcoded to 1024 in `/init`'s collection creation
(chroma #7127, already on main).
- **Idempotent**: `AlreadyExists` / `CollectionAlreadyHasFunction` are
treated as success, so `/init` stays safe to call repeatedly.
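
As a rough illustration of the attach rules listed above, the sketch below builds the attachment name and `params` for one source collection and folds the two "already attached" outcomes into success. It is not the code from this PR: `AttachPlan`, `AttachError`, `plan_attach`, and `treat_already_attached_as_ok` are hypothetical names, and the real `SysDb::create_attached_function` signature is not shown here. Only the variant names `AlreadyExists` and `CollectionAlreadyHasFunction`, the `{source}_to_wiki` naming, the three param keys, and the default of 100 come from the description.

```rust
use std::collections::BTreeMap;

/// Hypothetical error type. Only the two "already done" variant names
/// are taken from the PR description; `Other` is a placeholder.
#[derive(Debug, PartialEq)]
pub enum AttachError {
    AlreadyExists,
    CollectionAlreadyHasFunction,
    Other(String),
}

/// Hypothetical description of one attach call for a source collection.
#[derive(Debug, PartialEq)]
pub struct AttachPlan {
    pub name: String,
    pub operator: String,
    pub params: BTreeMap<String, String>,
    pub min_records_for_invocation: u64,
}

/// Builds the plan: name `{source}_to_wiki`, and params where both
/// `source_collection` and `source_kind` are the source name.
pub fn plan_attach(source: &str, operator: &str, endpoint_url: &str) -> AttachPlan {
    let mut params = BTreeMap::new();
    params.insert("endpoint_url".to_string(), endpoint_url.to_string());
    params.insert("source_collection".to_string(), source.to_string());
    params.insert("source_kind".to_string(), source.to_string());
    AttachPlan {
        name: format!("{source}_to_wiki"),
        operator: operator.to_string(),
        params,
        // Default stated in the description (matches the frontend default).
        min_records_for_invocation: 100,
    }
}

/// Maps "already attached" outcomes to success so a repeated call is a no-op;
/// any other error is passed through unchanged.
pub fn treat_already_attached_as_ok(result: Result<(), AttachError>) -> Result<(), AttachError> {
    match result {
        Ok(())
        | Err(AttachError::AlreadyExists)
        | Err(AttachError::CollectionAlreadyHasFunction) => Ok(()),
        Err(other) => Err(other),
    }
}
```

Wrapping the attach result this way keeps the retry logic in one place, so the caller can propagate anything that is not an "already attached" outcome with `?`.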

## Shared constant

The chunk-sibling flag key is promoted to
`chroma_types::CHROMA_GROUP_CHUNK_SIBLINGS_KEY` so the reader (worker
`PartitionOperator`) and writer (`/init`) share one definition;
`partition_log.rs` re-exports it.
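
A minimal sketch of the single-definition pattern, assuming the key's value is the `chroma:group_chunk_siblings` string named in the summary. The three modules stand in for the real crates and files, the metadata is simplified to a `HashMap<String, bool>`, and `groups_chunk_siblings` and `source_collection_metadata` are hypothetical helpers rather than functions from this PR.

```rust
/// Stand-in for the `chroma_types` crate: the one place the key is defined.
pub mod chroma_types {
    /// Value assumed from the flag name in the PR summary.
    pub const CHROMA_GROUP_CHUNK_SIBLINGS_KEY: &str = "chroma:group_chunk_siblings";
}

/// Stand-in for the worker's `partition_log.rs` (the reader side).
pub mod partition_log {
    // Re-export, so existing `partition_log::CHROMA_GROUP_CHUNK_SIBLINGS_KEY`
    // paths keep resolving to the shared definition.
    pub use super::chroma_types::CHROMA_GROUP_CHUNK_SIBLINGS_KEY;

    use std::collections::HashMap;

    /// Hypothetical reader: true only when the flag is present and set to true.
    pub fn groups_chunk_siblings(metadata: &HashMap<String, bool>) -> bool {
        metadata
            .get(CHROMA_GROUP_CHUNK_SIBLINGS_KEY)
            .copied()
            .unwrap_or(false)
    }
}

/// Stand-in for the `/init` route (the writer side).
pub mod init {
    use super::chroma_types::CHROMA_GROUP_CHUNK_SIBLINGS_KEY;
    use std::collections::HashMap;

    /// Hypothetical helper: metadata to request for each source collection.
    pub fn source_collection_metadata() -> HashMap<String, bool> {
        HashMap::from([(CHROMA_GROUP_CHUNK_SIBLINGS_KEY.to_string(), true)])
    }
}
```

Because both sides import the same constant, a typo in the key becomes a compile error instead of a silent mismatch between writer and reader.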

## Wiki collections deliberately untouched

Wiki/wiki_revisions are the function's *output* — no chunk-sibling flag,
no attach. The marker mechanism operates on the source/input side.

## Caveat: get-or-create idempotency

`/init` uses get-or-create for collections. If a source collection
**already exists** without the flag (e.g. created by an earlier upload),
the metadata isn't retroactively updated. `/init` must run before the
first upload (it's the bootstrap). The function attach is independently
idempotent. Pre-existing source collections would need a one-off
metadata backfill — out of scope here.
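
The caveat can be reproduced with a small in-memory model. This is only a sketch of get-or-create semantics, not the SysDb implementation: `Catalog`, `get_or_create`, and `has_flag` are hypothetical, and metadata is simplified to a `HashMap<String, bool>`.

```rust
use std::collections::HashMap;

/// Simplified metadata: flag name -> boolean value.
pub type Metadata = HashMap<String, bool>;

/// Hypothetical in-memory stand-in for the collection catalog.
#[derive(Default)]
pub struct Catalog {
    collections: HashMap<String, Metadata>,
}

impl Catalog {
    /// Returns true if the collection was created by this call.
    /// When it already exists, the requested metadata is ignored,
    /// which is the behavior the caveat describes.
    pub fn get_or_create(&mut self, name: &str, requested: Metadata) -> bool {
        if self.collections.contains_key(name) {
            return false;
        }
        self.collections.insert(name.to_string(), requested);
        true
    }

    /// True only if the collection exists and carries the flag set to true.
    pub fn has_flag(&self, name: &str, key: &str) -> bool {
        self.collections
            .get(name)
            .and_then(|metadata| metadata.get(key))
            .copied()
            .unwrap_or(false)
    }
}
```

In this model, a collection created by an earlier upload with empty metadata still lacks the flag after a later `get_or_create` that requests it, while a collection first created by the bootstrap call carries it. That is why the ordering (`/init` before the first upload) matters.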

## Test plan

- [x] `cargo check -p foundation-api`, `cargo check -p chroma-types`
pass.
- [x] `cargo test -p foundation-api --lib routes::init` — 4/4 pass.
- [ ] **CI must run the worker suite** — the `partition_log.rs`
re-export couldn't be compiled locally (Homebrew rustc 1.94.1 vs pinned
1.92.0; `wal3` fails under 1.94.1 independent of this change).
- [ ] End-to-end (post-#7133-merge): `/init` → source collections carry
the flag + have `http_generate` attached → uploads chunk into them →
attached function runs and observes the end-of-job marker last.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>